Development and validation of a novel CD4+ T cell‐related gene signature to detect severe COVID‐19

Dear Editor,
It is well known that coronavirus disease 2019 (COVID-19) has posed a serious threat to global public health in recent years.1 Although most patients had mild disease, some developed severe symptoms, especially the elderly.2 A prediction model for severe COVID-19 could therefore identify patients at risk of severe disease and allow timely, targeted treatment. The severity of COVID-19 depends significantly on the host's immune response.3 CD4+ T cell exhaustion and functional decline are critical immune mechanisms in the deterioration of COVID-19.4,5 In this study, we built a CD4+ T cell-related gene signature to provide greater insight into the immune mechanisms behind COVID-19 and to support the establishment of a prediction model for severe COVID-19 and clinical therapy.
A flowchart of this study is shown in Figure S1. After filtering the single-cell RNA-sequencing data of GSE163668, we obtained gene expression profiles of 100776 cells from 27 COVID-19 samples (Figure 1A). Using the top 2000 variable genes, we identified 31 cell clusters (Figure 1B) with identity annotations; clusters 5, 6, 9, 12, 18 and 30 were classified as CD4+ T cells (Figure 1C). Across these clusters, 126 differentially expressed genes were identified as CD4+ T cell marker genes (Table S1). The frequency of CD4+ T cells was clearly reduced in COVID-19 and decreased further in the severe group (Figure 1D). The GSE157103 dataset was therefore downloaded for further analysis. CD4+ T cell infiltration was reduced in severe COVID-19 (Figure 1E,F), and the CD4+ T cell infiltration score correlated with clinical indicators related to disease severity, especially hospital-free days at Day 45 (HFD45; Figure 1G). These results agree with earlier reports6 and indicate that CD4+ T cells are instrumental in the progression of critical COVID-19.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
To further identify key CD4+ T cell markers, we identified 69 differentially expressed genes (DEGs) between non-severe and severe patients in GSE157103 (Figure 1H, Table S2). In the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, these genes were enriched in several T cell-related pathways. In the biological process (BP) analysis, they were mainly connected to the regulation of T-cell activation. Cellular component enrichment was primarily associated with the T cell receptor complex, and molecular function enrichment was related to cytokine binding (Figure 1I, Table S3). Seven genes were obtained from the intersection of the DEGs and the CD4+ T cell markers (Figure 1J). They were involved in T-cell activation in the Gene Ontology (GO)-BP analysis (Figure 1K, Table S4), and KEGG pathway analysis showed that they were associated with primary immunodeficiency (Figure 1L, Table S4).
Given that severe COVID-19 progresses rapidly and is difficult to treat, an early biomarker or model for predicting disease severity is needed. Least absolute shrinkage and selection operator (LASSO) regression analysis was conducted, and four genes were selected: CD3D, CD3E, LCK and EVL (Figure 2A,B). The risk score was calculated with the following formula: risk score = −0.428 × CD3E − 0.693 × EVL − 0.229 × LCK − 0.204 × CD3D. The four key genes were decreased in severe COVID-19 (Figure 2C,F), and severe patients had noticeably higher risk scores (Figure 2G). As shown in Figure 2H, this gene signature showed good accuracy.
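The risk-score formula can be illustrated with a short Python sketch. The coefficients are those reported in the formula above; the function name, the dictionary layout and the expression values are hypothetical and serve only to show the arithmetic. The letter does not state how expression values were normalised, so the inputs here are arbitrary numbers, not patient data.

```python
# Coefficients as reported in the risk-score formula of this letter.
COEFFICIENTS = {"CD3E": -0.428, "EVL": -0.693, "LCK": -0.229, "CD3D": -0.204}


def risk_score(expression):
    """Return the weighted sum of the four signature genes.

    `expression` maps gene symbol -> expression value (the units and
    normalisation are an assumption; the letter does not specify them).
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression values, for illustration only.
higher_expression = {"CD3E": 5.0, "EVL": 6.0, "LCK": 4.0, "CD3D": 5.0}
lower_expression = {"CD3E": 3.0, "EVL": 4.0, "LCK": 2.0, "CD3D": 3.0}

print(risk_score(higher_expression))
print(risk_score(lower_expression))
```

Because all four coefficients are negative, lower expression of the signature genes yields a higher risk score, which is consistent with the reported decrease of these genes in severe patients.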
Meanwhile, the risk score was significantly correlated with clinical indicators related to disease severity, including D dimer (D-D), acute physiology and chronic health evaluation II score, HFD45, ventilator-free days, C-reactive protein (CRP), procalcitonin (PCT) and ferritin (Figure 2I). The detailed baseline information is summarized in Table S5. As expected, CD4+ T cell infiltration was fairly well correlated with the risk score (Figure 2J). We then compared the clinical data between the high- and low-risk groups. The high-risk group contained more severe cases and had higher ferritin, CRP, D-D and PCT levels (Table S6). The high-risk group also had a significantly lower CD4+ T cell infiltration score (Figure 2K). The B cell, macrophage and neutrophil infiltration scores were increased (Figure 2K); these cells contribute to the cytokine storm in COVID-19 patients. This indicated that patients in the high-risk group had a more pronounced imbalance of immune status (lymphocytopenia and inflammatory storm). Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is prone to errors during replication. Structural changes caused by mutations at immune recognition sites, such as the spike protein, can result in immune escape.7 A recent study raised the possibility that SARS-CoV-2 in immunocompromised patients may mutate to become less sensitive to neutralising antibodies when prolonged viral replication occurs.8 In our research, high-risk patients had lower immune scores (Figure 2L), suggesting that they might be immunocompromised, which might make the virus more likely to mutate.
To clarify the mechanism, we performed an enrichment analysis of the risk score-related genes (Table S7) and found that most of them were enriched in mitochondrial BPs and metabolic pathways (Figure 2M, Table S8). This could be the underlying reason why people with metabolic dysfunction may react more severely to COVID-19. Interestingly, other studies observed that patients with Gaucher disease, a metabolic disease, showed protection against progression to the severe form.9,10 Therefore, the role of metabolic function in COVID-19 needs further research. We included clinical information and risk scores in univariate and multivariate logistic regression analyses, which showed that the risk score and D-D were both independent risk factors for critical cases (Table 1). As the risk score and D-D level rose, subjects were more likely to develop severe disease.
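For readers less familiar with this analysis, the multivariate result can be read through the standard logistic model, sketched here in general form. The symbols are illustrative assumptions: p denotes the probability of severe disease, x_RS the risk score, x_DD the D-D level and x_k any further clinical covariates. No fitted coefficient values are given in this text, so none are shown.

```latex
\[
\log\frac{p}{1-p}
  = \beta_0 + \beta_{\mathrm{RS}}\,x_{\mathrm{RS}}
  + \beta_{\mathrm{DD}}\,x_{\mathrm{DD}}
  + \sum_k \beta_k x_k,
\qquad
\mathrm{OR}_j = e^{\beta_j}
\]
```

Under this model, an odds ratio above 1 for a variable means that the odds of severe disease rise as that variable increases while the other covariates are held fixed, which is the sense in which a variable is called an independent risk factor.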
To validate our results, we found that the severe group also had a higher risk score than the non-severe group in GSE152418 (Figure 3A). The gene signature also showed favourable diagnostic efficacy for severe COVID-19 (Figure 3B) and was related to CD4+ T cells (Figure 3C). Finally, we collected blood samples and clinical information from COVID-19 patients. The detailed demographic characteristics are shown in Table S9. The mRNA levels of the four key genes were significantly decreased in severe patients (Figure 3D-G). These genes also showed good diagnostic value for severe COVID-19 (Figure 3H-K) and a positive association with CD4+ T cell count (Figure 3L-O). However, the specific mechanism still needs investigation. Combining clinical data with the gene signature predicts the severity of COVID-19 more convincingly.
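Diagnostic efficacy of a continuous score is commonly summarised by the area under the receiver operating characteristic curve (AUC). The letter does not state which metric underlies the figures cited above, so the following is only a minimal sketch under that assumption. The function name and the scores and labels are hypothetical, not study data.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney identity.

    The AUC equals the probability that a randomly chosen severe case
    (label 1) has a higher score than a randomly chosen non-severe case
    (label 0); tied pairs count as one half.
    """
    severe = [s for s, y in zip(scores, labels) if y == 1]
    non_severe = [s for s, y in zip(scores, labels) if y == 0]
    if not severe or not non_severe:
        raise ValueError("need at least one severe and one non-severe sample")
    wins = 0.0
    for s in severe:
        for n in non_severe:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(severe) * len(non_severe))


# Hypothetical risk scores and severity labels, for illustration only.
example_scores = [-5.1, -6.0, -8.1, -8.2, -7.5, -7.0]
example_labels = [1, 1, 1, 0, 0, 0]

print(roc_auc(example_scores, example_labels))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of severe from non-severe cases. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of this scale.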
In summary, we identified a CD4+ T cell-related gene signature by combining data from public databases with clinical COVID-19 patient samples, shedding some light on the pathogenic mechanisms of severe COVID-19. More research is warranted on the CD4+ T cell-related immune processes in COVID-19 pathology.
Abbreviations: CI, confidence interval; CRP, C-reactive protein; D-D, D dimer; FIB, fibrinogen; LAC, lactate; OR, odds ratio; PCT, procalcitonin.

ACKNOWLEDGEMENTS
The authors would like to sincerely thank the Center for Scientific Research of Anhui Medical University for its valuable help in the experiment.

CONFLICT OF INTEREST STATEMENT
The authors declare that they have no competing interests.

Da-Wei Zhang 1,2; Fang Li 1,2; Yuan-Yuan Wei 1,2; Lei Hu 1,2; Su-Hong Chen 3; Ming-Ming Yang 1,2; Wen-Ting Zhang 1,2; Guang-He Fei 1,2